Domain descriptions should represent more than the characteristics of data and the
operations on it. They should be "semantic" in the sense that they may represent 
information such as the meanings of special terms used in the business, as well as goals 
and rules. ER models are often described as "semantic data models". However, the 
correspondence between ER and natural language is through syntactic rather than 
through semantic constructs. Conceptual modeling languages and knowledge 
representatio ... 
Lexical choice is a computationally complex task, requiring a generation system to
consider a potentially large number of mappings between concepts and words. Constraints 
that aid in determining which word is best come from a wide variety of sources, including 
syntax, semantics, pragmatics, the lexicon, and the underlying domain. Furthermore, in 
some situations, different constraints come into play early on, while in others, they apply 
much later. This makes it difficult to determine a systemati ... 
Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from
the tradition of combining different knowledge sources in artificial intelligence research.
An important step in the exploration of this hypothesis is to determine which linguistic 
knowledge sources are most useful and whether their combination leads to improved 
results. We present a sense tagger which uses several knowledge sources. Tested 
accuracy exceeds 94% on our evaluation corpus. Our system attempts ... 
Much effort has been put into computational lexicons over the years, and most systems
give much room to (lexical) semantic data. However, in these systems, the effort put on 
the study and representation of lexical items to express the underlying continuum existing 
in 1) language vagueness and polysemy, and 2) language gaps and mismatches, has 
remained embryonic. A sense enumeration approach fails from a theoretical point of view 
to capture the core meaning of words, let alone relate word meaning ... 



16 Research session: The SphereSearch engine for unified ranked retrieval of heterogeneous XML and web documents
Jens Graupmann, Ralf Schenkel, Gerhard Weikum

August 2005 Proceedings of the 31st international conference on Very large data bases VLDB '05

Publisher: VLDB Endowment 

Full text available: pdf (381.86 KB) Additional Information: full citation, abstract, references, index terms

This paper presents the novel SphereSearch Engine that provides unified ranked retrieval 
on heterogeneous XML and Web data. Its search capabilities include vague structure 
conditions, text content conditions, and relevance ranking based on IR statistics and 
statistically quantified ontological relationships. Web pages in HTML or PDF are 
automatically converted into XML format, with the option of generating semantic tags by 
means of linguistic annotation tools. For Web data the XML-oriented query ... 
Natural language processing systems require three different types of lexicons: the concept
lexicon that describes the (sub)world ontology and the analysis and generation lexicons 
for natural languages. We argue that the acquisition of the concept lexicon must precede 
any lexical work on natural language and that a comprehensive lexicon management 
system (LMS) is necessary for lexicon acquisition in large-scale applications. We describe 
the interactive concept lexicon acquisition module of the LM ... 
This paper presents a method that combines a set of unsupervised algorithms in order to
accurately build large taxonomies from any machine-readable dictionary (MRD). Our aim 
is to profit from conventional MRDs, with no explicit semantic coding. We propose a 
system that 1) performs fully automatic extraction of taxonomic links from MRD entries 
and 2) ranks the extracted relations in a way that selective manual refinement is allowed. 
Tested accuracy can reach around 100% depending on the degree of ... 
In the fall of 1978 we decided to produce a special issue of the SIGART Newsletter
devoted to a survey of current knowledge representation research. We felt that there
were two useful functions such an issue could serve. First, we hoped to elicit a clear
picture of how people working in this subdiscipline understand knowledge representation 
research, to illuminate the issues on which current research is focused, and to catalogue 
what approaches and techniques are currently being developed. Secon ... 
This paper presents an ongoing task that will construct a DAML+OIL-compliant Chinese
Lexical Ontology. The ontology mainly comprises three components: a hierarchical 
taxonomy consisting of a set of concepts and a set of relations describing the relationships 
among the concepts, a set of lexical entries associated with the concepts and relations, 
and a set of axioms describing the constraints on the ontology. It currently contains 1,075 
concepts, 65,961 lexical entries associated with the concept ... 
Verb alternations have been researched extensively in linguistics, but they have not yet
received a systematic treatment in natural language generation systems; consequently,
generators cannot make informed choices among alternatives. As a step towards 
overcoming this discrepancy, we review some linguistic work on several prominent 
alternations, revise and extend it, and suggest a set of rules that allow the series of 
alternated forms to be produced from a single base form of the verb, the lexic ... 
Technological advances in biomedical research are generating a plethora of heterogeneous
data at a high rate. There is a critical need for extraction, integration and management 
tools for information discovery and synthesis from these heterogeneous data. In this 
paper, we present a general architecture, called ALFA, for information extraction and 
representation from diverse biological data. The ALFA architecture consists of: (i) a 
networked, hierarchical object model for representing information ... 
information retrieval, interactive text mining, software architecture, user-guided
information extraction 



Special section: Machine translation of natural languages 
Sergei Nirenburg 

April 1985 ACM SIGART Bulletin, issue 92
Publisher: ACM Press 

Full text available: pdf (1.75 MB) Additional Information: full citation, abstract, references

The field of machine translation has recently entered a new, third period in its evolution. 
In its early period, for roughly fifteen years from 1950, MT was an expanding field of study
in which both research and development efforts were undertaken. It is well known and
well documented (Bar-Hillel, 1960; ALPAC, 1966) that this early MT paradigm could not
and did not produce fully automated high quality translation systems. In fact, the practical 
results were quite negligible for such a high-scale ... 
In this paper we propose an analysis and an upgrade of WordNet's top-level synset
taxonomy. We briefly review WordNet and identify its main semantic limitations. Some 
principles from a forthcoming OntoClean methodology are applied to the ontological 
analysis of WordNet. A revised top-level taxonomy is proposed, which is meant to be 
more conceptually rigorous, cognitively transparent, and efficiently exploitable in several 
applications. 

Keywords: WordNet, ontology, taxonomies, top-level 
Over the past few years, there have been a number of papers arguing the relative merits
of primitives and prototypes as representations for the meaning of natural language. Much 
of the discussion has been both pugnacious and confused, with each author setting up one 
or another straw-man to knock down. Much of the confusion has resulted from a lack of 
agreement as to what it would mean for a system to use primitives or prototypes. There 
are several different ... 
Word sense disambiguation algorithms, with few exceptions, have made use of only one
lexical knowledge source. We describe a system which performs word sense 
disambiguation on all content words in free text by combining different knowledge 
sources: semantic preferences, dictionary definitions and subject/domain codes along with 
part-of-speech tags, optimised by means of a learning algorithm. We also describe the 
creation of a new sense tagged corpus by combining existing resources. Tested accura ... 

Results 1 - 20 of 200

The ACM Portal is published by the Association for Computing Machinery. Copyright © 2006 ACM, Inc. 




http://portal.acm.org/resultsx^ 2/2/06 



Results (page 1): hierarchical lexicon dictionary 



Page 1 of 6 





Terms used: hierarchical lexicon dictionary




Found 3,630 of 169,866 




Results 1 - 20 of 200 
Best 200 shown 






1 Statistical Machine Translation with Scarce Resources Using Morpho-syntactic Information

Sonja Nießen, Hermann Ney

June 2004 Computational Linguistics, volume 30 issue 2 
Publisher: MIT Press 

Full text available: pdf Additional Information: full citation, abstract

In statistical machine translation, correspondences between the words in the source and 
the target language are learned from parallel corpora, and often little or no linguistic 
knowledge is used to structure the underlying models. In particular, existing statistical 
systems for machine translation often treat different inflected forms of the same lemma 
as if they were independent of one another. The bilingual training data can be better 
exploited by explicitly taking into account the interdepend ... 
We describe the lexical knowledge base system (LKB) which has been designed and
implemented as part of the ACQUILEX project to allow the representation of multilingual
syntactic and semantic information extracted from machine readable dictionaries (MRDs), 
in such a way that it is usable by natural language processing (NLP) systems. The LKB's 
lexical representation language (LRL) augments typed graph-based unification with 
default inheritance, formalised in terms of default unificatio ... 
We are studying how to extract hierarchical relation on verbs from definition sentences in a
Japanese dictionary. The hierarchical relation on verbs has been dealt with as a binary 
relation on verbs, but it should be dealt with as logical relation on predicates. We will 
define the logical form of the hierarchical relation on verbs and then discuss which part of 
the syntactic structure of the definition sentence represents that relation. We will call the 
main predicate verb in this part the defini ... 
Complete magnetic tape transcriptions of Webster's Seventh New Collegiate Dictionary
(hereafter W7) and The New Merriam-Webster Pocket Dictionary (hereafter MPD) are now
being examined by Olney, Reichert, Revard, and others as part of the Lexicographic
Project (directed by Olney) at System Development Corporation. Programs are being
used or written to process these transcriptions in various ways, both automatically and 
interactively. ... 
This paper describes the lexical database tool LOLA (Linguistic-Oriented Lexical database
Approach) which has been developed for the construction and maintenance of lexicons for 
the machine translation system LMT. First, the requirements such a tool should meet are 
discussed, then LMT and the lexical information it requires, and some issues concerning 
vocabulary acquisition are presented. Afterwards the architecture and the components of 
the LOLA system are described and it is shown how we tried ... 
We describe work toward the construction of a very wide-coverage probabilistic parsing
system for natural language (NL), based on LR parsing techniques. The system is 
intended to rank the large number of syntactic analyses produced by NL grammars 
according to the frequency of occurrence of the individual rules deployed in each analysis. 
We discuss a fully automatic procedure for constructing an LR parse table from a 
unification-based grammar formalism, and consider the suitability of alternative ... 
Approaches to natural language processing that use a phrasal lexicon have the advantage
of easily handling linguistic constructions that might otherwise be extragrammatical. 
However, current phrasal lexicons are often too rigid: their phrasal entries fail to cover the 
more flexible constructions. FLUSH, for Flexible Lexicon Utilizing Specialized and 
Hierarchical knowledge, is a knowledge-based lexicon design that allows broad phrasal 
coverage. 